Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning

نویسندگان

  • Manaal Faruqui
  • Ryan T. McDonald
  • Radu Soricut
چکیده

Morpho-syntactic lexicons provide information about the morphological and syntactic roles of words in a language. Such lexicons are not available for all languages and even when available, their coverage can be limited. We present a graph-based semi-supervised learning method that uses the morphological, syntactic and semantic relations between words to automatically construct wide coverage lexicons from small seed sets. Our method is language-independent, and we show that we can expand a 1000 word seed lexicon to more than 100 times its size with high quality for 11 languages. In addition, the automatically created lexicons provide features that improve performance in two downstream tasks: morphological tagging and dependency parsing.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Data-Driven Graph Construction for Semi-Supervised Graph-Based Learning in NLP

Graph-based semi-supervised learning has recently emerged as a promising approach to data-sparse learning problems in natural language processing. All graph-based algorithms rely on a graph that jointly represents labeled and unlabeled data points. The problem of how to best construct this graph remains largely unsolved. In this paper we introduce a data-driven method that optimizes the represe...

متن کامل

Graph-Based Lexicon Expansion with Sparsity-Inducing Penalties

We present novel methods to construct compact natural language lexicons within a graphbased semi-supervised learning framework, an attractive platform suited for propagating soft labels onto new natural language types from seed data. To achieve compactness, we induce sparse measures at graph vertices by incorporating sparsity-inducing penalties in Gaussian and entropic pairwise Markov networks ...

متن کامل

Semi-Supervised Frame-Semantic Parsing for Unknown Predicates

We describe a new approach to disambiguating semantic frames evoked by lexical predicates previously unseen in a lexicon or annotated data. Our approach makes use of large amounts of unlabeled data in a graph-based semi-supervised learning framework. We construct a large graph where vertices correspond to potential predicates and use label propagation to learn possible semantic frames for new o...

متن کامل

Using Verb-Noun Patterns to Detect Process Inputs

We present the preliminary results of an ongoing work aimed at using morpho-syntactic patterns to extract information from process descriptions in a semi-supervised manner. The experiments have been designed for generic information extraction tasks and evaluated on detecting ingredients from cooking recipes in French using a large gold standard corpus. The proposed method uses bi-lexical depend...

متن کامل

یک چارچوب نیمه‌نظارتی مبتنی بر لغت‌نامه وفقی خودساخت جهت تحلیل نظرات فارسی

With the appearance of Web 2.0 and 3.0, users’ contribution to WWW has created a huge amount of valuable expressed opinions. Considering the difficulty or impossibility of manually analyzing such big data, sentiment analysis, as a branch of natural language processing, has been highly considered. Despite the other (popular) languages, a limited number of research studies have been conducted in ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • TACL

دوره 4  شماره 

صفحات  -

تاریخ انتشار 2016